
22 research outputs found

The shaky foundations of simulating single-cell RNA sequencing data

Author: Crowell Helena L
Morillo Leonardo Sarah X
Robinson Mark D
Soneson Charlotte
Publication venue: BioMed Central
Publication date: 29/03/2023

BACKGROUND: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, both on their own and in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to make results credible and transferable to real data. RESULTS: Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch- and cluster-level. Secondly, we investigate the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which and to what extent quality control summaries can capture reference-simulation similarity. CONCLUSIONS: Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, they yield over-optimistic performance of integration and potentially unreliable ranking of clustering methods, and it is generally unknown which summaries are important to ensure effective simulation-based method comparisons

ZORA

distinct: a novel approach to differential distribution analyses

Author: Helena L Crowell
Lukas M Weber
Mark D Robinson
Pantelis Samartsidis
Simone Tiberi
Publication date: 01/01/2023

We present distinct, a general method for differential analysis of full distributions that is well suited to applications on single-cell data, such as single-cell RNA sequencing and high-dimensional flow or mass cytometry data. High-throughput single-cell data reveal an unprecedented view of cell identity and allow complex variations between conditions to be discovered; nonetheless, most methods for differential expression target differences in the mean and struggle to identify changes where the mean is only marginally affected. distinct is based on a hierarchical non-parametric permutation approach and, by comparing empirical cumulative distribution functions, identifies both differential patterns involving changes in the mean, as well as more subtle variations that do not involve the mean. We performed extensive benchmarks across both simulated and experimental datasets from single-cell RNA sequencing and mass cytometry data, where distinct shows favourable performance, identifies more differential patterns than competitors, and displays good control of false positive and false discovery rates. distinct is available as a Bioconductor R package
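
The idea of comparing empirical cumulative distribution functions (ECDFs) under permutation can be illustrated with a minimal sketch. This is not the distinct package's algorithm: it is a plain two-group permutation test without the hierarchical (sample-level) structure the abstract describes, and the function names `ecdf_distance` and `permutation_pvalue`, the choice of test statistic, and the toy data are all assumptions made for illustration.

```python
import random
from bisect import bisect_right


def ecdf_distance(a, b):
    """Sum of absolute ECDF differences, evaluated at every pooled value."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    grid = sorted(set(a_sorted + b_sorted))
    total = 0.0
    for x in grid:
        # ECDF value = fraction of observations less than or equal to x.
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        total += abs(fa - fb)
    return total


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Permutation p-value for the ECDF distance between groups a and b."""
    rng = random.Random(seed)
    observed = ecdf_distance(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffle group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        if ecdf_distance(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): both groups have mean 0 but very different spread.
narrow = [i * 0.02 - 0.29 for i in range(30)]
wide = [i - 14.5 for i in range(30)]
print(permutation_pvalue(narrow, wide, n_perm=500))
```

The toy groups share the same mean, which is the kind of pattern a purely mean-based test is not designed to pick up, while a statistic built on the full ECDFs still responds to the difference in spread.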

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

An R-based reproducible and user-friendly preprocessing pipeline for CyTOF data

Author: Bodenmiller Bernd
Chevrier Stéphane
Crowell Helena L
et al
Jacobs Andrea
Kölzer Viktor
Robinson Mark D
Sivapatham Sujana
Publication venue: F1000 Research Ltd
Publication date: 01/01/2020

Mass cytometry (CyTOF) has become a method of choice for in-depth characterization of tissue heterogeneity in health and disease, and is currently implemented in multiple clinical trials, where higher quality standards must be met. Currently, preprocessing of raw files is commonly performed in independent standalone tools, which makes it difficult to reproduce. Here, we present an R pipeline based on an updated version of CATALYST that covers all preprocessing steps required for downstream mass cytometry analysis in a fully reproducible way. This new version of CATALYST is based on Bioconductor's SingleCellExperiment class and is fully unit tested. The R-based pipeline includes file concatenation, bead-based normalization, single-cell deconvolution, spillover compensation and live cell gating after debris and doublet removal. Importantly, this pipeline also includes different quality checks to assess machine sensitivity and staining performance, while also allowing for batch correction. This pipeline is based on open source R packages and can easily be adapted to different study designs. It therefore has the potential to significantly facilitate the work of CyTOF users while increasing the quality and reproducibility of data generated with this technology

ZORA

CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets [version 3; peer review: 2 approved]

Author: Burkhard Becher
Carsten Krieg
Felix J. Hartmann
Helena L. Crowell
Lukas M. Weber
Malgorzata Nowicka
Mark D. Robinson
Mitchell P. Levesque
Silvia Guglietta
Publication venue: F1000 Research Ltd
Publication date: 01/05/2019

High-dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high-throughput interrogation and characterization of cell populations. Here, we present an updated R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signalling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models or linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g., multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g., plots of aggregated signals)

Directory of Open Access Journals

Meta-analysis of (single-cell method) benchmarks reveals the need for extensibility and interoperability

Author: Al-Ajami Ahmad
Crowell Helena L
Fanaswala Imran
Gerber Reto
Germain Pierre-Luc
Gilis Jeroen
Heidari Elyas
Knyazev Sergey
Luetge Almut
Mallona Izaskun
Mangul Serghei
Milosavljevic Stefan
Paul Dominique
Robinson Mark D
Saeys Yvan
Schmeing Stephan
Seurinck Ruth
Sonder Emanuel
Soneson Charlotte
Sonrel Anthony
Publication venue: BioMed Central
Publication date: 17/05/2023

Computational methods represent the lifeblood of modern molecular biology. Benchmarking is important for all methods, but with a focus here on computational methods, benchmarking is critical to dissect important steps of analysis pipelines, formally assess performance across common situations as well as edge cases, and ultimately guide users on what tools to use. Benchmarking can also be important for community building and advancing methods in a principled way. We conducted a meta-analysis of recent single-cell benchmarks to summarize the scope, extensibility, and neutrality, as well as technical features and whether best practices in open data and reproducible research were followed. The results highlight that while benchmarks often make code available and are in principle reproducible, they remain difficult to extend, for example, as new methods and new ways to assess methods emerge. In addition, embracing containerization and workflow systems would enhance reusability of intermediate benchmarking results, thus also driving wider adoption

ZORA

Hand2 delineates mesothelium progenitors and is reactivated in mesothelioma.

Author: Brombacher Eline C
Burger Alexa
Clouthier David E
Crowell Helena L
Daetwyler Stephan
Ernst Alexander
Felley-Bosco Emanuela
Firulli Anthony B
Huisken Jan
Kocere Agnese
Kresoja-Rakic Jelena
Labbaf Zahra
Mercader Nadia
Mosimann Christian
Naganathan Sundar R
Nieuwenhuize Susan
O'Rourke Rebecca
Prummel Karin D
Raz Erez
Robinson Mark D
Ronner Manuel
Soneson Charlotte
Sánchez-Iranzo Héctor
Publication venue: Springer Science and Business Media LLC
Publication date: 30/03/2022

The mesothelium lines body cavities and surrounds internal organs, widely contributing to homeostasis and regeneration. Mesothelium disruptions cause visceral anomalies and mesothelioma tumors. Nonetheless, the embryonic emergence of mesothelia remains incompletely understood. Here, we track mesothelial origins in the lateral plate mesoderm (LPM) using zebrafish. Single-cell transcriptomics uncovers a post-gastrulation gene expression signature centered on hand2 in distinct LPM progenitor cells. We map mesothelial progenitors to lateral-most, hand2-expressing LPM and confirm conservation in mouse. Time-lapse imaging of zebrafish hand2 reporter embryos captures mesothelium formation including pericardium, visceral, and parietal peritoneum. We find primordial germ cells migrate with the forming mesothelium as ventral migration boundary. Functionally, hand2 loss disrupts mesothelium formation with reduced progenitor cells and perturbed migration. In mouse and human mesothelioma, we document expression of LPM-associated transcription factors including Hand2, suggesting re-initiation of a developmental program. Our data connects mesothelium development to Hand2, expanding our understanding of mesothelial pathologies

REPISALUD

Bern Open Repository and Information System (BORIS)

MPG.PuRe

Hand2 delineates mesothelium progenitors and is reactivated in mesothelioma

Author: Brombacher Eline C
Burger Alexa
Clouthier David E
Crowell Helena L
Daetwyler Stephan
Ernst Alexander
Felley-Bosco Emanuela
Firulli Anthony B
Huisken Jan
Kocere Agnese
Kresoja-Rakic Jelena
Labbaf Zahra
Mercader Nadia
Mosimann Christian
Naganathan Sundar R
Nieuwenhuize Susan
O’Rourke Rebecca
Prummel Karin D
Raz Erez
Robinson Mark D
Ronner Manuel
Soneson Charlotte
Sánchez-Iranzo Héctor
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2022

ZORA

muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data

Author: Calini Daniela
Collin Ludovic
Crowell Helena L
Germain Pierre-Luc
Malhotra Dheeraj
Raposo Catarina
Robinson Mark D
Soneson Charlotte
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2020

Single-cell RNA sequencing (scRNA-seq) has become an empowering technology to profile the transcriptomes of individual cells on a large scale. Early analyses of differential expression have aimed at identifying differences between subpopulations to identify subpopulation markers. More generally, such methods compare expression levels across sets of cells, thus leading to cross-condition analyses. Given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making sample-level inferences, termed here as differential state analysis; however, it is not clear which statistical framework best handles this situation. Here, we surveyed methods to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated pseudobulk data. To evaluate method performance, we developed a flexible simulation that mimics multi-sample scRNA-seq data. We analyzed scRNA-seq data from mouse cortex cells to uncover subpopulation-specific responses to lipopolysaccharide treatment, and provide robust tools for multi-condition analysis within the muscat R package
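
The aggregation step behind pseudobulk methods can be sketched as follows. This is a minimal illustration that assumes sum aggregation of raw counts per subpopulation and sample; it is not the muscat package's implementation, and the `pseudobulk` function, the input layout, and the sample, subpopulation, and gene labels are hypothetical.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum gene counts over all cells sharing a (subpopulation, sample) pair.

    `cells` is an iterable of (sample_id, cluster_id, {gene: count}) tuples.
    Returns {(cluster_id, sample_id): {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, cluster_id, counts in cells:
        for gene, count in counts.items():
            totals[(cluster_id, sample_id)][gene] += count
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {key: dict(genes) for key, genes in totals.items()}


# Toy input (made-up labels and counts): two samples, two subpopulations.
cells = [
    ("s1", "neuron", {"Gad1": 2, "Fos": 1}),
    ("s1", "neuron", {"Gad1": 1}),
    ("s1", "astro", {"Gfap": 4}),
    ("s2", "neuron", {"Fos": 5}),
]
profiles = pseudobulk(cells)
print(profiles[("neuron", "s1")])
```

Each resulting profile summarizes one sample within one subpopulation, so a cross-condition comparison can then treat samples, rather than individual cells, as the replicates.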

ZORA

The shaky foundations of simulating single-cell RNA sequencing data

Author: Charlotte Soneson
Helena L. Crowell
Mark D. Robinson
Sarah X. Morillo Leonardo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2023

Background: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, both on their own and in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to make results credible and transferable to real data. Results: Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch- and cluster-level. Secondly, we investigate the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which and to what extent quality control summaries can capture reference-simulation similarity. Conclusions: Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, they yield over-optimistic performance of integration and potentially unreliable ranking of clustering methods, and it is generally unknown which summaries are important to ensure effective simulation-based method comparisons

Repository for Publications and Research Data

Directory of Open Access Journals

SpatialExperiment: infrastructure for spatially resolved transcriptomics data in R using Bioconductor

Author: Collado-Torres Leonardo
Crowell Helena L
Ghazanfar Shila
Hicks Stephanie C
Lun Aaron T L
Pardo Brenda
Righelli Dario
Risso Davide
Weber Lukas M
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2022

Summary: SpatialExperiment is a new data infrastructure for storing and accessing spatially resolved transcriptomics data, implemented within the R/Bioconductor framework, which provides advantages of modularity, interoperability, standardized operations, and comprehensive documentation. Here, we demonstrate the structure and user interface with examples from the 10x Genomics Visium and seqFISH platforms, and provide access to example datasets and visualization tools in the STexampleData, TENxVisiumData, and ggspavis packages. Availability and implementation: The SpatialExperiment, STexampleData, TENxVisiumData, and ggspavis packages are available from Bioconductor. The package versions described in this manuscript are available in Bioconductor version 3.15 onwards. Supplementary information: Supplementary tables and figures are available at Bioinformatics online

PubMed Central

Archivio istituzionale della ricerca - Università di Padova